1  Introduction

The geometric relation between 3D objects and their views is a key component for various applications in computer vision, image coding, and animation. For example, the change in the 2D projection of a moving 3D object is a source of information for 3D reconstruction and for visual recognition applications: in the former case the retinal changes produce the cues for 3D recovery, and in the latter case the retinal changes provide the cues for factoring out the effects of changing viewing positions on the recognition process.

The introduction of affine and projective tools into the field of computer vision has brought increased activity in the fields of structure from motion and recognition in recent years. The emerging realization is that non-metric information, although weaker than the information provided by depth maps and rigid camera geometries, is nonetheless useful: the framework may provide simpler algorithms, camera calibration is not required, more freedom in picture-taking is allowed (such as taking pictures of pictures of objects), and there is no need to make a distinction between orthographic and perspective projections. The list of contributions to this framework includes (though it is not intended to be complete) [17, 2, 30, 12, 46, 47, 13, 26, 7, 32, 34, 36, 25, 45, 29, 8, 10, 23, 31, 16, 15, 48]; most relevant to this paper is the work described in [17, 7, 13, 34, 36].

The material introduced so far in the literature concerning 3D geometry from multiple views focuses on either the projective framework [7, 13, 36] or the affine framework. The latter requires assuming parallel projection (cf. [17, 46, 45, 30]), certain a priori assumptions on object structure (for determining the location of the plane at infinity [7, 28]), or purely translational camera motion [24] (see also later in the text).

In this paper, we propose a unified framework that includes, by generalization and specialization, the Euclidean, projective and affine frameworks. The framework, which we call "relative affine", gives rise to an equation that captures most of the spectrum of previous results related to 3D-from-2D geometry, and introduces new, extremely simple algorithms for the tasks of reconstruction from multiple views, recognition by alignment, and certain image coding applications. For example, previous results in these areas, such as affine structure from orthographic views, projective structure from perspective views, the use of the plane at infinity for reconstruction (obtaining affine structure from perspective views), epipolar-geometry related results, and reconstruction under restricted camera motion (the case of pure translation), are often reduced to a single-line proof under the new framework (see Corollaries 1 to 6).

The basic idea is to choose a representation of projective space in which an arbitrarily chosen reference plane becomes the plane at infinity. We then show that under general, uncalibrated camera motion, the resulting new representations can be described by an element of the affine group applied to the initial representation. As a result, we obtain an affine invariant, which we call relative affine structure, relative to the initial representation. Via several corollaries of this basic result we show, among other things, that the invariant is a generalization of the affine structure under parallel projection [17] and is a specialization of the projective structure (projective structure can be described as a ratio of two relative affine structures). Furthermore, in computational terms the relative affine result requires fewer corresponding points and fewer calculations than the projective framework, and, when working with perspective views, it is the next most general framework after the projective one. Parts of this work, as it evolved, have been presented at the meetings cited in [33, 38], and in [27].


2  Notation

We consider object space to be the three-dimensional projective space $\mathcal{P}^3$, and image space to be the two-dimensional projective space $\mathcal{P}^2$. An object (or scene) is modeled by a set of points, and we let $\psi_i \subset \mathcal{P}^2$ denote (arbitrary) views of the object, indexed by $i$. Given two views with projection centers $O, O' \in \mathcal{P}^3$, respectively, the epipoles are defined as the intersections of the line $OO'$ with the two image planes. A set of numbers defined up to scale is enclosed in brackets; a set of numbers enclosed in parentheses defines a vector in the usual way. Because the image plane is finite, we can assign, without loss of generality, the value 1 as the third homogeneous coordinate to every observed image point. That is, if $(x, y)$ are the observed image coordinates of some point (with respect to some arbitrary origin, say the geometric center of the image), then $p = [x, y, 1]$ denotes the homogeneous coordinates of the image point. When only two views $\psi_0, \psi_1$ are discussed, points in $\psi_0$ are denoted by $p$, their corresponding points in $\psi_1$ are denoted by $p'$, and the epipoles are $v \in \psi_0$ and $v' \in \psi_1$. When multiple views are considered, appropriate indices are added, as explained later in the text. The symbol $\cong$ denotes equality up to a scale, $GL_n$ stands for the group of invertible $n \times n$ matrices, and $PGL_n$ is the corresponding group of matrices defined up to a scale.

A camera coordinate system is a Euclidean frame describing the actual internal geometry of the camera (the position of the image plane relative to the camera center). If $p = (x, y, 1)^T$ is a point in the observed coordinate representation, then $M^{-1}p$ represents the camera coordinates, where $M$ is an upper-triangular matrix containing the internal parameters of the camera. When $M$ is known, the camera is said to be internally calibrated, and when $M = I$ the camera is in "standard" calibration mode. The material presented in this paper does not require further details of internal calibration, such as its decomposition into the components of principal point, image plane aspect ratios and skew; only the mere existence of $M$ is required for the remainder of this paper.

3  Relative Affine Structure

The following theorems and corollaries introduce our main results, which are then followed by explanatory text.

Theorem 1 (Relative Affine Structure [33]) Let $\pi$ be some arbitrary plane and let $P_j \in \pi$, $j = 1, 2, 3$, project onto $p_j, p'_j$ in views $\psi_0, \psi_1$, respectively. Let
$p_0 \in \psi_0$ and $p'_0 \in \psi_1$ be projections of $P_0 \notin \pi$. Let $A \in PGL_3$ be a homography of $\mathcal{P}^2$ determined by the equations $Ap_j \cong p'_j$, $j = 1, 2, 3$, and $Av \cong v'$, scaled to satisfy the equation $p'_0 \cong Ap_0 + v'$. Then, for any point $P$ projecting onto $p \in \psi_0$ and $p' \in \psi_1$, we have

$$p' \cong Ap + kv' \qquad (1)$$

The coefficient $k = k(p)$ is independent of $\psi_1$, i.e., it is invariant to the choice of the second view, and the coordinates of $P$ are $[x, y, 1, k]$.

Figure 1: See proof of Theorem 1.

Proof. We assign the coordinates $(1,0,0,0)$, $(0,1,0,0)$, $(0,0,1,0)$ to $P_1, P_2, P_3$, respectively. Let $O$ and $O'$ be the projection centers associated with the views $\psi_0$ and $\psi_1$, respectively, and let their coordinates be $(0,0,0,1)$ and $(1,1,1,1)$, respectively (see Figure 1). This choice of representation is always possible because $P_1, P_2, P_3, O, O'$ are five points of $\mathcal{P}^3$ in general position, and any such five points can be mapped to the standard projective basis. By construction, the point of intersection of the line $OO'$ with $\pi$ has the coordinates $(1,1,1,0)$.

Let $P$ be some object point projecting onto $p, p'$. The line $OP$ intersects $\pi$ at the point $(\alpha, \beta, \gamma, 0)$. The coordinates $\alpha, \beta, \gamma$ can be recovered by projecting the image plane onto $\pi$, as follows. Given the epipoles $v \in \psi_0$ and $v' \in \psi_1$, we have, by our choice of coordinates, that $p_1, p_2, p_3$ and $v$ are projectively (in $\mathcal{P}^2$) mapped onto $e_1 = (1,0,0)$, $e_2 = (0,1,0)$, $e_3 = (0,0,1)$ and $e_4 = (1,1,1)$, respectively. Therefore, there exists a unique element $A_1 \in PGL_3$ that satisfies $A_1 p_j \cong e_j$, $j = 1, 2, 3$, and $A_1 v = e_4$. Note that we have made a choice of scale by setting $A_1 v$ to $e_4$; this is simply for convenience, as will be clear later on. Let $A_1 p = (\alpha, \beta, \gamma)$.

Similarly, the line $O'P$ intersects $\pi$ at $(\alpha', \beta', \gamma', 0)$. Let $A_2 \in PGL_3$ be defined by $A_2 p'_j \cong e_j$, $j = 1, 2, 3$, and $A_2 v' = e_4$. Let $A_2 p' = (\alpha', \beta', \gamma')$. Since $P$ can be described as a linear combination of two points along each of the lines $OP$ and $O'P$, we have the following equation:


$$P \cong \begin{pmatrix}\alpha\\ \beta\\ \gamma\\ 0\end{pmatrix} - k\begin{pmatrix}0\\ 0\\ 0\\ 1\end{pmatrix} \cong \begin{pmatrix}\alpha'\\ \beta'\\ \gamma'\\ 0\end{pmatrix} - s\begin{pmatrix}1\\ 1\\ 1\\ 1\end{pmatrix},$$
from which it readily follows that $k = s$ once the two expressions are brought to a common scale (i.e., the transformation between the two representations of $\mathcal{P}^3$ is affine). Note that since only ratios of coordinates are significant in $\mathcal{P}^n$, $k$ is determined up to a uniform scale, and any point $P_0 \notin \pi$ can be used to set a mutual scale for all views, for example by setting an appropriate scale for $A$. The value of $k$ can easily be determined from image measurements as follows: we have


$$\begin{pmatrix}\alpha'\\ \beta'\\ \gamma'\end{pmatrix} \cong \begin{pmatrix}\alpha\\ \beta\\ \gamma\end{pmatrix} + k\begin{pmatrix}1\\ 1\\ 1\end{pmatrix}.$$
Multiply both sides by $A_2^{-1}$ to obtain $p' \cong Ap + kv'$, where $A = A_2^{-1}A_1$. Note that $A \in PGL_3$ is a homography between the two image planes, due to $\pi$, determined by $p'_j \cong Ap_j$, $j = 1, 2, 3$, and $Av \cong v'$ (therefore, it can be recovered directly without going through $A_1, A_2$). Similar proofs that a homography of a plane can be recovered from three points and the epipoles are found in [34, 29]. Since $k$ is determined up to a uniform scale, we need a fourth correspondence $p_0, p'_0$, and we let $A$, or $v'$, be scaled such that $p'_0 \cong Ap_0 + v'$. Finally, $[x, y, 1, k]$ are the homogeneous coordinates representation of $P$, and the $3 \times 4$ matrix $[A, v']$ is a camera transformation matrix between the two views. □
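The proof is constructive, so it can be checked numerically. Below is a minimal NumPy sketch, assuming noise-free synthetic data with known epipoles; the helper names (`rotation`, `project`, `basis_homography`, `relative_affine`) and all scene and camera values are our own illustrative choices, not the paper's. It builds $A = A_2^{-1}A_1$ through the standard basis as in the proof, rescales it with $P_0$, solves Equation (1) for $k$ with a cross product, and repeats this with two different second views.

```python
import numpy as np

def rotation(axis, angle):
    """Rodrigues' formula: rotation by `angle` radians about `axis`."""
    a = np.asarray(axis, dtype=float)
    a = a / np.linalg.norm(a)
    S = np.array([[0, -a[2], a[1]], [a[2], 0, -a[0]], [-a[1], a[0], 0]])
    return np.eye(3) + np.sin(angle) * S + (1 - np.cos(angle)) * (S @ S)

def project(K, R, t, X):
    """Image points [x, y, 1] (rows) of the 3-D points X under the camera K[R | t]."""
    x = (X @ R.T + t) @ K.T
    return x / x[:, 2:3]

def basis_homography(q1, q2, q3, q4):
    """H with H e_j ~ q_j (j = 1, 2, 3) and H (1, 1, 1)^T = q4 exactly."""
    Q = np.column_stack([q1, q2, q3])
    return Q * np.linalg.solve(Q, q4)   # scales column j by lambda_j

def relative_affine(p, pp, v, vp):
    """k for all rows of p <-> pp (Theorem 1). Row 0 is P_0; rows 1-3 are P_1..P_3 on pi."""
    A1_inv = basis_homography(p[1], p[2], p[3], v)      # A_1^{-1}: e_j -> p_j, e_4 -> v
    A2_inv = basis_homography(pp[1], pp[2], pp[3], vp)  # A_2^{-1}: e_j -> p'_j, e_4 -> v'
    A = A2_inv @ np.linalg.inv(A1_inv)                  # A = A_2^{-1} A_1, hence A v = v'
    c = np.cross(pp[0], A @ p[0])
    A = A * (np.cross(vp, pp[0]) @ c) / (c @ c)         # rescale so that p'_0 ~ A p_0 + v'
    a = np.cross(pp, vp)
    b = np.cross(p @ A.T, pp)                           # k (p' x v') = A p x p'
    return np.sum(a * b, axis=1) / np.sum(a * a, axis=1)

rng = np.random.default_rng(0)
X = np.vstack([
    [[0.3, 0.2, 7.0]],                                        # P_0 (off pi)
    [[-1.0, -1.0, 5.0], [1.0, -1.0, 5.5], [0.0, 1.0, 6.0]],   # P_1, P_2, P_3 span pi
    [[0.0, -1.0 / 3.0, 5.5]],                                 # centroid of P_1..P_3, on pi
    rng.uniform([-2, -2, 4], [2, 2, 8], size=(6, 3)),         # other scene points
])
K0 = np.array([[1.2, 0.01, 0.05], [0.0, 1.1, -0.02], [0.0, 0.0, 1.0]])
second_views = [   # two different (uncalibrated) second cameras: (K, R, t)
    (np.array([[1.0, 0.0, 0.1], [0.0, 0.9, 0.0], [0.0, 0.0, 1.0]]),
     rotation([0, 1, 0.2], 0.1), np.array([1.0, 0.2, 0.1])),
    (np.array([[1.3, 0.02, 0.0], [0.0, 1.25, 0.1], [0.0, 0.0, 1.0]]),
     rotation([1, 0.3, 0], -0.15), np.array([-0.5, 0.8, -0.3])),
]

p = project(K0, np.eye(3), np.zeros(3), X)     # reference view psi_0
k_per_view = []
for K, R, t in second_views:
    pp = project(K, R, t, X)
    v = K0 @ (-R.T @ t)      # epipole in psi_0: image of the second camera centre
    vp = K @ t               # epipole in the second view: image of O
    k_per_view.append(relative_affine(p, pp, v, vp))
k01, k02 = k_per_view
print(np.round(k01, 4))
```

With exact data, `k01` and `k02` agree to rounding error, $k = 1$ for $P_0$ and $k = 0$ for the point on $\pi$, which is the invariance stated by Theorem 1. With noisy measurements the cross-product step would be solved in the least-squares sense, as in Corollary 6.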

Theorem 2 (Further Algebraic Aspects [27]) Let the coordinate transform from $P = zp$ to $P' = z'p'$ be described by $P' = M'RM^{-1}P + M'T$, where $R, T$ are the rotational and translational parameters of the relative camera displacement, and $M, M'$ are the internal camera parameters. Given $A$, $v'$ and $k$ defined in Theorem 1, let $n$ be the unit normal to the plane $\pi$, and $d_\pi$ the (perpendicular) distance of the origin to $\pi$, both in the first camera coordinate frame. Then,

$$A \cong M'\left(R + \frac{T n^T}{d_\pi}\right) M^{-1}, \qquad (2)$$

$$k = \frac{z_0}{z}\,\alpha_p,$$

where $z_0$ is the depth of $P_0 \notin \pi$, and $\alpha_p = \alpha(p)$ is the affine structure of $P$ in the case of parallel projection (the ratio of perpendicular distances of $P$ and $P_0$ from $\pi$).

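Proof. Let $\tilde{P}$ be the intersection of the ray $OP$ with $\pi$. Then $\tilde{P}' = M'RM^{-1}\tilde{P} + M'T$. Since $n^T M^{-1}\tilde{P} = d_\pi$, we have $\tilde{P}' = M'\left(R + \frac{Tn^T}{d_\pi}\right)M^{-1}\tilde{P}$. Since this holds for every point of $\pi$, the matrix $M'\left(R + \frac{Tn^T}{d_\pi}\right)M^{-1}$ describes the homography due to $\pi$, and we have $A \cong M'\left(R + \frac{Tn^T}{d_\pi}\right)M^{-1}$, which is the generalization of the classical motion of planes in the calibrated case [9, 43]. For the point $P$ we have: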


$$\frac{z'}{z}p' = M'RM^{-1}p + \frac{1}{z}M'T = Ap + \left(\frac{1}{z} - \frac{n^T(M^{-1}p)}{d_\pi}\right)M'T = Ap + \frac{d_p}{z\,d_\pi}M'T.$$


Here $d_p = d_\pi - n^T(M^{-1}P)$ is the (perpendicular) distance from $P$ to $\pi$. Comparing with Equation (1) shows that $v' \cong M'T$ and that $k$ is proportional to $d_p/z$; normalizing with $P_0$ so that $k(p_0) = 1$, we thus have

$$k = \frac{z_0}{z}\,\frac{d_p}{d_0},$$

where $d_0$ is the (perpendicular) distance of $P_0$ from $\pi$ (see Figure 3-a). Finally, note that the ratio $\alpha_p = d_p/d_0$ of the distances of $P$ and $P_0$ from $\pi$ is the affine structure when the projection is parallel (see Figure 2). □

Figure 2: Affine structure under parallel projection is $d_p/d_0$. This can be seen from the similarity of trapezoids followed by the similarity of triangles.
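Theorem 2 can likewise be checked numerically. This NumPy sketch uses made-up internal parameters, motion and plane (all illustrative assumptions; `rotation` is a helper we introduce for Rodrigues' formula). It forms $A$ from Equation (2), verifies the unnormalized relation $(z'/z)p' = Ap + \frac{d_p}{z\,d_\pi}M'T$ used in the proof, and compares $k$ measured in the images (with $v' = M'T$, normalized so that $k(P_0) = 1$) against $(z_0/z)(d_p/d_0)$.

```python
import numpy as np

def rotation(axis, angle):
    """Rodrigues' formula: rotation by `angle` radians about `axis`."""
    a = np.asarray(axis, dtype=float)
    a = a / np.linalg.norm(a)
    S = np.array([[0, -a[2], a[1]], [a[2], 0, -a[0]], [-a[1], a[0], 0]])
    return np.eye(3) + np.sin(angle) * S + (1 - np.cos(angle)) * (S @ S)

# Illustrative camera and plane parameters (not from the paper)
M = np.array([[1.1, 0.02, 0.1], [0.0, 1.0, -0.05], [0.0, 0.0, 1.0]])
Mp = np.array([[0.9, 0.0, 0.03], [0.0, 0.95, 0.02], [0.0, 0.0, 1.0]])
R = rotation([0.2, 1.0, 0.1], 0.12)
T = np.array([0.8, -0.1, 0.2])
n = np.array([0.1, -0.2, 1.0])
n = n / np.linalg.norm(n)      # unit normal of pi
d_pi = 5.0                     # pi = {X : n^T X = d_pi} in the first camera frame

A = Mp @ (R + np.outer(T, n) / d_pi) @ np.linalg.inv(M)   # Equation (2)

rng = np.random.default_rng(1)
X = rng.uniform([-2, -2, 3], [2, 2, 9], size=(10, 3))      # first-camera coordinates
X[0] = [0.3, 0.2, 8.0]                                     # row 0 plays P_0 (off pi)
P = X @ M.T                    # P = z p; z = X_z because M's last row is (0, 0, 1)
z = P[:, 2]
p = P / z[:, None]
Pp = X @ R.T @ Mp.T + Mp @ T   # P' = M'R M^{-1} P + M'T = M'(R X + T)
zp = Pp[:, 2]
pp = Pp / zp[:, None]
d_p = d_pi - X @ n             # signed perpendicular distance of P from pi

# Relation used in the proof: (z'/z) p' = A p + d_p / (z d_pi) M'T
lhs = (zp / z)[:, None] * pp
rhs = p @ A.T + (d_p / (z * d_pi))[:, None] * (Mp @ T)

# k measured from the images, with v' = M'T and the scale fixed by P_0
vp = Mp @ T
a, b = np.cross(pp, vp), np.cross(p @ A.T, pp)
kappa = np.sum(a * b, axis=1) / np.sum(a * a, axis=1)
k_image = kappa / kappa[0]
k_formula = (z[0] / z) * (d_p / d_p[0])
```

Because $A$ is taken at the scale given by Equation (2), the first check is an exact equality rather than an equality up to scale.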

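Corollary 1 Relative affine structure $k$ approaches affine structure under parallel projection when $O$ goes to infinity, i.e., $k \to \alpha_p$ when $O \to \infty$.

Proof. When $O \to \infty$, then $z, z_0 \to \infty$ and $z_0/z \to 1$. Thus $k = \frac{z_0}{z}\alpha_p \to \alpha_p$ (see Figure 2). □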

Corollary 2 When the plane $\pi$ is at infinity (with respect to the camera coordinate frame), relative affine structure $k$ is affine structure under perspective: $k = z_0/z$ and $A = M'RM^{-1}$; if, in addition, the cameras are internally calibrated as $M = M' = I$, then $A = R$.

Proof. When $\pi$ is at infinity, then $d_p, d_0 \to \infty$ and $d_p/d_0 \to 1$. Thus $k = z_0/z$. Also $d_\pi \to \infty$, thus $A \to M'RM^{-1}$ (see Figure 3-b). □

Corollary 3 (Pure Translation) In the case of pure translational motion of the camera, and when the internal camera parameters remain fixed, i.e., $M = M'$, the selection of the identity homography $A = I$ (in Equation 1) leads to an affine reconstruction of the scene (i.e., the identity matrix is the homography due to the plane at infinity). In other words, the scalar $k$ in

$$p' \cong p + kv'$$

is invariant under all subsequent camera motions that leave the internal parameters unchanged and consist of only translation of the camera center. The coordinates $[x, y, 1, k]$ are related to the camera coordinate frame by an element of the affine group.

Proof. Follows immediately from Corollary 2: the homography due to the plane at infinity is $A = M'RM^{-1}$. Hence, $A = I$ when $M = M'$ and $R = I$ (pure translational motion). □

Corollary 4 The projective structure of the scene can be described as the ratio of two relative affine structures, each with respect to a distinct reference plane $\pi$ and $\tilde{\pi}$, respectively, which in turn can be described as the ratio of affine structures under parallel projection with respect to the same two planes.

Proof. Let $k_\pi$ and $k_{\tilde{\pi}}$ be the relative affine structures with respect to planes $\pi$ and $\tilde{\pi}$, respectively, using the same reference point $P_0$ for both. From Theorem 2 we have that $k_\pi = \frac{z_0}{z}\alpha_\pi$ and $k_{\tilde{\pi}} = \frac{z_0}{z}\alpha_{\tilde{\pi}}$, where $\alpha_\pi$ and $\alpha_{\tilde{\pi}}$ are the affine structures of $P$ under parallel projection with respect to $\pi$ and $\tilde{\pi}$. The ratio $k_\pi / k_{\tilde{\pi}}$ removes the dependence on the projection center $O$ (the factor $z_0/z$ cancels out) and is therefore a projective invariant (see Figure 4). This projective invariant is also the ratio of cross-ratios of the rays $OP$ and $OP_0$ with their intersections with the two planes $\pi$ and $\tilde{\pi}$, which was introduced in [34, 36] as "projective depth". It is also the ratio of two affine structures under parallel projection (recall that $d_p/d_0$ is the affine structure; see Figure 2). □
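Using the closed form $k = (z_0/z)(d_p/d_0)$ from Theorem 2, the cancellation in Corollary 4 can be seen numerically. This NumPy sketch (illustrative planes, points and cameras; the function name is hypothetical) evaluates $k$ with respect to two planes from two reference cameras with different centers: the individual values of $k$ change with the camera, while their ratio does not.

```python
import numpy as np

def relative_affine_closed_form(X, X0, C, r, n, d):
    """k = (z0 / z)(d_p / d_0) of Theorem 2 for a reference camera with centre C and
    unit optical axis r, and the plane {Y : n^T Y = d}."""
    z, z0 = (X - C) @ r, (X0 - C) @ r          # depths in the reference camera
    return (z0 / z) * ((d - X @ n) / (d - X0 @ n))

rng = np.random.default_rng(2)
X = rng.uniform([-2, -2, 3], [2, 2, 9], size=(8, 3))
X0 = np.array([0.3, 0.2, 8.0])                  # common scale point P_0
planes = [
    (np.array([0.0, 0.0, 1.0]), 5.0),                                      # pi
    (np.array([0.3, 0.2, 1.0]) / np.linalg.norm([0.3, 0.2, 1.0]), 10.0),  # pi tilde
]
cameras = [
    (np.zeros(3), np.array([0.0, 0.0, 1.0])),
    (np.array([1.5, -0.5, -1.0]),
     np.array([-0.2, 0.1, 1.0]) / np.linalg.norm([-0.2, 0.1, 1.0])),
]

ks, ratios = [], []
for C, r in cameras:
    k_pi, k_pi_t = (relative_affine_closed_form(X, X0, C, r, n, d) for n, d in planes)
    ks.append(k_pi)
    ratios.append(k_pi / k_pi_t)   # projective invariant: independent of the camera
```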

Corollary 5 The "essential" matrix $E = [v']R$ is a particular case of a generalized matrix $F = [v']A$, where $[v']$ denotes the skew-symmetric matrix satisfying $[v']x = v' \times x$ for every vector $x$. The matrix $F$, referred to as the "fundamental" matrix in [7], is unique and does not depend on the plane $\pi$. Furthermore, $Fv = 0$ and $F^T v' = 0$.

Proof. Let $p \in \psi_0$, $p' \in \psi_1$ be two corresponding points, and let $l, l'$ be their corresponding epipolar lines, i.e., $l \cong p \times v$ and $l' \cong p' \times v'$. Since lines are projective invariants, any point along $l$ is mapped by $A$ to some point along $l'$. Thus, $l' \cong v' \times Ap$, and because $p'$ is incident to $l'$, we have $p'^T(v' \times Ap) = 0$, or equivalently $p'^T[v']Ap = 0$, or $p'^T F p = 0$, where $F = [v']A$. From Corollary 2, $A = R$ in the special case where the plane $\pi$ is at infinity and the cameras are internally calibrated as $M = M' = I$; thus $E = [v']R$ is a special case of $F$. The uniqueness of $F$ follows from substituting $A$ with Equation 2 and noting that $[v']M'T = 0$ (since $v' \cong M'T$ is the image of $O$ in $\psi_1$); thus $F \cong [v']M'RM^{-1}$, which does not involve $\pi$. Finally, since $Av \cong v'$, we have $[v']Av \cong [v']v' = v' \times v' = 0$, thus $Fv = 0$; and $F^T v' = A^T[v']^T v' = -A^T(v' \times v') = 0$. □
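Corollary 5 can be verified in a few lines. The NumPy sketch below uses illustrative parameters; `skew` and `rotation` are helper names we introduce. It builds $A$ from Equation (2) for two different planes, forms $F = [v']A$ for each, and checks that both equal $[v']M'RM^{-1}$, that corresponding points satisfy $p'^T F p = 0$, and that $Fv = 0$ and $F^T v' = 0$.

```python
import numpy as np

def skew(v):
    """[v]: the matrix with [v] x = v x x (cross product)."""
    return np.array([[0, -v[2], v[1]], [v[2], 0, -v[0]], [-v[1], v[0], 0]])

def rotation(axis, angle):
    """Rodrigues' formula: rotation by `angle` radians about `axis`."""
    a = np.asarray(axis, dtype=float)
    a = a / np.linalg.norm(a)
    S = skew(a)
    return np.eye(3) + np.sin(angle) * S + (1 - np.cos(angle)) * (S @ S)

# Illustrative internal parameters and motion (not from the paper)
M = np.array([[1.1, 0.02, 0.1], [0.0, 1.0, -0.05], [0.0, 0.0, 1.0]])
Mp = np.array([[0.9, 0.0, 0.03], [0.0, 0.95, 0.02], [0.0, 0.0, 1.0]])
R = rotation([0.2, 1.0, 0.1], 0.12)
T = np.array([0.8, -0.1, 0.2])
M_inv = np.linalg.inv(M)
vp = Mp @ T            # v': image of O in the second view
v = M @ (-R.T @ T)     # v: image of O' in the first view

def fundamental_from_plane(n, d):
    """F = [v'] A, with A from Equation (2) for the plane n^T X = d."""
    n = n / np.linalg.norm(n)
    A = Mp @ (R + np.outer(T, n) / d) @ M_inv
    return skew(vp) @ A

F1 = fundamental_from_plane(np.array([0.0, 0.0, 1.0]), 5.0)
F2 = fundamental_from_plane(np.array([0.3, -0.4, 1.0]), 2.0)
F_motion = skew(vp) @ Mp @ R @ M_inv          # [v'] M' R M^{-1}

rng = np.random.default_rng(3)
X = rng.uniform([-2, -2, 3], [2, 2, 9], size=(10, 3))   # first-camera coordinates
p = X @ M.T
p = p / p[:, 2:3]
pp = X @ R.T @ Mp.T + Mp @ T
pp = pp / pp[:, 2:3]
residuals = np.einsum('ij,jk,ik->i', pp, F1, p)         # p'^T F p for each point
```

With $A$ taken at the scale given by Equation (2), the two plane-induced matrices agree exactly, not merely up to scale, because the plane-dependent term $[v']M'Tn^TM^{-1}/d_\pi$ vanishes identically.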

Corollary 6 (Stream of views) Given $m > 2$ views, let $A_j$ be the homography of $\pi$ from view $\psi_0$ to view $\psi_j$ and $v_j$ the corresponding epipole in $\psi_j$, and let the views of an object point $P$ be $p_j$, where the index $j$ ranges over the $m - 1$ views other than $\psi_0$. Then, the least squares solution for $k$ is given by

$$k = \frac{\sum_j (p_j \times v_j)^T (A_j p_0 \times p_j)}{\sum_j \|p_j \times v_j\|^2} \qquad (3)$$


Proof. This is simply a calculation based on the observation that, given a general equation of the type $a \cong b + kc$, performing a cross product with $a$ on both sides gives $k(a \times c) = b \times a$. The value of $k$ can be found using the normal equations (treating $k$ as a vector of dimension 1):

$$k = \frac{(b \times a)^T (a \times c)}{\|a \times c\|^2}.$$

Similarly, if in addition we have $a' \cong b' + kc'$, then the overall least squares solution is given by

$$k = \frac{(b \times a)^T (a \times c) + (b' \times a')^T (a' \times c')}{\|a \times c\|^2 + \|a' \times c'\|^2}.$$

Substituting $a = p_j$, $b = A_j p_0$ and $c = v_j$ for each view gives Equation (3). □

Figure 3: (a) Relative affine structure: $k = \frac{z_0}{z}\frac{d_p}{d_0}$. (b) Affine structure under perspective (when $\pi$ is at infinity). Note that the rays from $O$ and $O'$ towards the plane at infinity are parallel; thus the homography is the rotational component of motion.

Figure 4: Projective depth [34, 36] is the ratio of two relative affine structures, each with respect to a distinct reference plane, which is also the ratio of two affine structures (see Corollary 4 for more details).
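Equation (3) translates directly into code. This is a minimal NumPy sketch (the function name and the synthetic data are illustrative assumptions): each $p_j$ is generated exactly from $p_j \cong A_j p_0 + k v_j$ with an arbitrary positive scale, to show that the estimate is insensitive to the unknown homogeneous scale of the image points.

```python
import numpy as np

def relative_affine_ls(p0, views):
    """Equation (3): least-squares k from p_j ~ A_j p_0 + k v_j over several views.

    p0    : [x, y, 1] in the reference view psi_0
    views : iterable of (p_j, A_j, v_j), one per view other than psi_0
    """
    num, den = 0.0, 0.0
    for pj, Aj, vj in views:
        c = np.cross(pj, vj)                  # p_j x v_j
        num += c @ np.cross(Aj @ p0, pj)      # (p_j x v_j)^T (A_j p_0 x p_j)
        den += c @ c                          # ||p_j x v_j||^2
    return num / den

# Synthetic check: each p_j is generated from the model with an arbitrary scale
rng = np.random.default_rng(4)
p0 = np.array([0.2, -0.1, 1.0])
k_true = 0.37
views = []
for _ in range(4):
    Aj = np.eye(3) + 0.1 * rng.normal(size=(3, 3))
    vj = rng.normal(size=3)
    pj = rng.uniform(0.5, 2.0) * (Aj @ p0 + k_true * vj)   # "~": unknown scale
    views.append((pj, Aj, vj))
k_est = relative_affine_ls(p0, views)
```

With a single view the sum has one term and the formula reduces to the two-view solution used in the proof above.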

3.1  Explanatory  Text 

The key idea in Theorem 1 was to use both camera centers as part of the reference frame in order to show that the transformation between an arbitrary representation $\mathcal{R}_0$ of space as seen from the first camera and the representation $\mathcal{R}$ as seen from any other camera position can be described by an element of the affine group. In other words, we have chosen an arbitrary plane $\pi$ and made a choice of representation $\mathcal{R}_0$ in which $\pi$ is the plane at infinity (i.e., $\pi$ was mapped to infinity, a not unfamiliar trick, especially in computer graphics). The representation $\mathcal{R}_0$ is associated with $[x, y, 1, k]$, where $k$ vanishes for all points coplanar with $\pi$, which means that $\pi$ is the plane at infinity under the representation $\mathcal{R}_0$. What was left to show is that $\pi$ remains the plane at infinity under all subsequent camera transformations, and therefore $k$ is an affine invariant. Because $k$ is invariant relative to the representation $\mathcal{R}_0$, we named it "relative affine structure"; this should not be confused with the term "relative invariants" used in classical invariant theory (invariants multiplied by a power of the transformation determinant, as opposed to "absolute invariants").

In practical terms, the difference between a full projective framework (as in [7, 13, 36]) and the relative affine framework can be described as follows. In a full projective framework, if we denote by $f$ the invariance function acting on a pair of views indexed by a fixed set of five corresponding points, then $f(\psi_i, \psi_j)$ is fixed for all $i, j$. In a relative affine framework, if we denote by $f_0$ the invariance function acting on a fixed view $\psi_0$ and an arbitrary view $\psi_i$ and indexed by a fixed set of four corresponding points, then $f_0(\psi_0, \psi_i)$ is fixed for all $i$.

The remaining Theorem 2 and the corollaries put the relative affine framework within the familiar context of affine structure under parallel and perspective projections, Euclidean structure and projective structure. The homography $A$ due to the plane $\pi$ was described as a product of the rigid camera motion parameters, the parameters of $\pi$, and the internal camera parameters of both cameras. This result is a natural extension of the classical motion of planes found in [9, 43], and also in [22]. The relative affine structure $k$ was described as a product of the affine structure under parallel projection and a term that contains the location of the camera center of the reference view. Geometrically, $k$ is a ratio of two ratios: the first is the ratio of the perpendicular¹ distance of a point $P$ to the plane $\pi$ and its depth from the reference camera, and the second is of the same form but applied to a fixed point $P_0$, which is used to set a uniform scale to the system. Therefore, when the depth goes to infinity (projection approaches orthography), $k$ approaches the ratio of the perpendicular distance of $P$ from $\pi$ and the perpendicular distance of $P_0$ from $\pi$, which is precisely the affine structure under parallel projection [17]. Thus, relative affine structure is a generalization in the sense of including the center of projection of an arbitrary camera, and when the camera center goes to infinity we obtain an affine structure which becomes independent of the reference camera.

¹Note that the distance can be measured along any fixed direction. We use the perpendicular distance because it is the most natural way of describing the distance between a point and a plane.

Another specialization of relative affine structure was shown in Corollary 2 by considering the case when $\pi$ is at infinity with respect to our Euclidean frame (i.e., really at infinity). In that case $k$ is simply inverse depth (up to a uniform scale factor), and the homography $A$ is the familiar rotational component of camera motion (orthogonal matrix $R$) in the case of calibrated cameras, or a product of $R$ with the internal calibration parameters. In other words, when $\pi$ is at infinity also with respect to our camera coordinate frame, relative affine becomes affine (the plane at infinity is preserved under all representations [7]). Notice that the rays towards the plane at infinity are parallel across the two cameras (see Figure 3-b). Thus, there exists a rotation matrix that aligns the two bundles of rays, and following this line of argument, the same rotation matrix aligns the epipolar lines (scaled appropriately) because rotations commute with cross products ($R(a \times b) = Ra \times Rb$). We therefore have the algorithm of [18] for determining the rotational component of standard calibrated camera motion, given the epipoles. In practice, of course, we cannot recover the homography due to the plane at infinity unless we are given prior information on the nature of the scene structure [28], or the camera motion is purely translational ([24] and Corollary 3). Thus, in the general case, we can realize either the relative affine framework or the projective framework.

In Corollary 3 we address a particular case in which we can recover the homography due to the plane at infinity, and hence recover the affine structure of the scene. This is the case where the camera motion is purely translational and the internal camera parameters remain fixed (i.e., we use the same camera for all views). This case was addressed in [24] by using clever and elaborate geometric constructions. The basic idea in [24] is that under pure translation of a calibrated camera, certain lines and points on the plane at infinity are easily constructed in the image plane. A line and a point from the plane at infinity are then used as auxiliaries for recovering the affine coordinates of the scene (with respect to a frame of four object points).

The relative affine framework provides a single-line proof of the main result of [24], and furthermore provides an extremely simple algorithm for the reconstruction of affine structure from a purely translating camera with fixed internal parameters, as follows. The epipole is the focus of expansion and is determined from two corresponding points ($v' \cong (p_i \times p'_i) \times (p_j \times p'_j)$, for some $p_i, p_j$). Given corresponding points $p, p'$ in the two views, the coordinates $[x, y, 1, k]$, where $k$ satisfies $p' \cong p + kv'$, are related to the Euclidean coordinates (with respect to a camera coordinate frame) by an element of the affine group. The scalar $k$ is determined up to scale, thus one of the points, say $p_0$, should determine the scale by scaling $v'$ to satisfy $p'_0 \cong p_0 + v'$ (note that $p_0$ can coincide with one of the points $p_i$ or $p_j$ used for determining $v'$). In case we would like to determine the affine coordinates with respect to four object points $P_1, \ldots, P_4$, we simply assign the standard coordinates $(0,0,0)$, $(1,0,0)$, $(0,1,0)$ and $(0,0,1)$ to those points, and solve for the 3D affine transformation that maps $[x_i, y_i, 1, k_i]$, $i = 1, \ldots, 4$, onto the standard coordinates (the mapping contains 12 parameters, and each of the four points determines three linear equations).
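The pure-translation algorithm above is short enough to sketch in full. This NumPy version assumes noise-free synthetic data; the function names, the intrinsic matrix `K` and the translations are illustrative choices. It finds the focus of expansion from two point matches, scales $v'$ with $p_0$, solves $p' \cong p + kv'$ for $k$, and then expresses the points in the affine frame of four of them, using the inhomogeneous form $(x/k, y/k, 1/k)$ of $[x, y, 1, k]$.

```python
import numpy as np

def pure_translation_k(p, pp, i1=0, i2=1, i0=2):
    """Corollary 3 (A = I): k with p' ~ p + k v' for every row of p <-> pp.

    Rows i1, i2 locate the epipole (focus of expansion); row i0 plays P_0."""
    vp = np.cross(np.cross(p[i1], pp[i1]), np.cross(p[i2], pp[i2]))
    c = np.cross(pp[i0], vp)
    vp = vp * (np.cross(p[i0], pp[i0]) @ c) / (c @ c)     # so that p'_0 ~ p_0 + v'
    a, b = np.cross(pp, vp), np.cross(p, pp)             # k (p' x v') = p x p'
    return np.sum(a * b, axis=1) / np.sum(a * a, axis=1)

def affine_coords(Q, frame=(0, 1, 2, 3)):
    """Coordinates of the rows of Q in the affine frame sending Q[frame[0]], ...,
    Q[frame[3]] to (0,0,0), (1,0,0), (0,1,0), (0,0,1)."""
    origin = Q[frame[0]]
    B = np.column_stack([Q[f] - origin for f in frame[1:]])
    return np.linalg.solve(B, (Q - origin).T).T

rng = np.random.default_rng(5)
X = rng.uniform([-2, -2, 3], [2, 2, 8], size=(12, 3))   # camera-frame scene points
K = np.array([[1.1, 0.01, 0.05], [0.0, 1.05, -0.03], [0.0, 0.0, 1.0]])  # fixed, never used by the method

def view(t):
    """Image points [x, y, 1] after translating the camera by t (R = I, same K)."""
    x = (X + t) @ K.T
    return x / x[:, 2:3]

p = view(np.zeros(3))
k1 = pure_translation_k(p, view(np.array([0.4, -0.1, 0.3])))
k2 = pure_translation_k(p, view(np.array([-0.6, 0.5, -0.2])))
Q = p / k1[:, None]                 # inhomogeneous form of [x, y, 1, k]
aff = affine_coords(Q)              # affine coordinates w.r.t. rows 0-3
aff_true = affine_coords(X)         # the same, from the true 3-D points
```

Both translations give the same $k$ (here $k = z_0/z$, inverse depth up to scale, as Corollary 2 predicts for the plane at infinity), and the affine coordinates recovered from the images match those of the true points, although `K` is never used by the method.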

To conclude the implications of Corollary 3, we observe that, given the epipole $v'$, we need only one more point match (for setting a mutual scale) in order to determine affine structure. This is obvious because the epipole is the translational component of camera motion, and since this is the only motion we assume to have, the structure of the scene should follow without additional information. This case is very similar to the classic paradigm of stereopsis: instead of assuming that epipolar lines are horizontal, we recover the epipole (two point matches are sufficient), and instead of assuming a calibrated camera, we assume an uncalibrated camera whose internal parameters remain fixed; in turn, instead of recovering depth, we can recover at most the affine structure of the scene. Finally, the result that the homography due to the plane at infinity is the identity matrix can be derived on geometric grounds as well. Points and lines from the plane at infinity are fixed points of the homography; with an affine frame of four points we can observe four fixed points, and a homography with four fixed points in general position is necessarily the identity matrix.

The connection between the relative affine structure and projective structure was shown in Corollary 4. Projective invariants are necessarily described with reference to five scene points [7], or equivalently, with reference to two planes and a point lying outside of them both [36, 34]. Corollary 4 shows that by taking the ratio of two relative affine structures, each relative to a different reference plane, the dependence on the camera center (the term $z_0/z$) drops and we are left with the projective invariant described in [36], which is the ratio of the perpendicular distances of a point to two planes (up to a uniform scale factor).

Corollary 5 unifies previous results on the nature of what is by now known as the "fundamental matrix" [7, 8]. It is shown that for any plane $\pi$ and its corresponding homography $A$ we have $F = [v']A$. First, we see that given a homography, the epipole $v'$ follows from two corresponding points coming from scene points not coplanar with $\pi$ (by Equation 1, each line through $Ap$ and $p'$ passes through $v'$), an observation that was originally made by [18]. Second, $F$ is fixed regardless of the choice of $\pi$, which was shown by using the result of Theorem 2. As a particular case, the product $[v']R$ remains fixed if we add to $R$ an element that vanishes as a product with $[v']$, an observation that was made previously by [13]. Thirdly, the "essential" matrix [19], $E = [v']R$, is shown to be a specialization of $F$ in the case where $\pi$ is at infinity with respect to the world coordinate frame and the cameras are internally calibrated as $M = M' = I$.

Finally, Corollary 6 provides a practical formula for obtaining a least-squares estimate of relative affine structure, which also applies to the case where a stream of views is available, in the spirit of [46, 42, 23, 41, 1, 5]. In the next section we apply these results to obtain a simple algorithm for relative affine reconstruction from multiple ($m > 2$) views and multiple points.

3.2  Application I: Reconstruction from a Stream of Views

Taken together, the results above demonstrate the ability to compute relative affine structure using many points over many views in a least squares manner. At minimum, we need two views, four corresponding points and the corresponding epipoles to recover $k$ for all other points of the scene whose projections onto the two views are given. Let $p_{ij}$, $i = 0, \ldots, n$ and $j = 0, \ldots, m$, denote the $i$'th image point on frame $j$. Let $A_j$ denote the homography from frame 0 to frame $j$, $v_j, v'_j$ the corresponding epipoles such that $A_j v_j \cong v'_j$, and let $k_i$ denote the relative affine structure of point $i$. We follow these steps:

1.  Compute  epipoles 

Vj^Vj  using  the  relation  PijFjPio  —  0,  over  all  i. 
Eight  corresponding  points  (frame  0  and  frame  j) 
are  needed  for  a  linear  solution,  and  a  least-squares 
solution  is  possible  if  more  points  are  available.  In 
practice  the  best  results  were  obtained  using  the 
non-linear  algorithm  of  [21].  The  epipoles  follow  by 
FjVj  —  0  and  =0  [7].  The  latter  readily  fol¬ 
lows  from  Corollary  5  as  [Fj]AjVj  ^  =  0  and 

2. Compute $A_j$ from the equations $A_j p_{i0} \cong p_{ij}$, $i = 1, 2, 3$, and $A_j v_j \cong v'_j$. This yields a linear set of eight equations that determines $A_j$ up to a scale. If additional points are available, the equation $p_{ij}^T [v'_j] A_j p_{i0} = 0$ gives a least-squares solution (Corollary 5). Scale $A_j$ to satisfy $p_{0j} \cong A_j p_{00} + v'_j$.

3. The relative affine structure $k_i$ is then given by Equation (3).
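Steps 1 to 3 can be sketched in NumPy as shown below. This is a minimal sketch on noise-free synthetic data. It is not the implementation used in the experiments. It makes several assumptions:

- It uses the plain linear eight-point method, without coordinate normalization, instead of the non-linear algorithm of [21].
- It fits $A_j$ with a standard DLT to the three points and the epipole.
- It takes $k_i$ as the least-squares solution of $p_{ij} \times (A_j p_{i0} + k_i v'_j) = 0$, pooled over all frames.

The pooled form is only our guess at how Corollary 6 and Equation (3) might be realized, and it may weight the views differently. All function names and the scene are hypothetical.

```python
import numpy as np

def skew(v):
    """[v]: skew-symmetric matrix with skew(v) @ x == np.cross(v, x)."""
    return np.array([[0.0, -v[2], v[1]], [v[2], 0.0, -v[0]], [-v[1], v[0], 0.0]])

def null_vector(M):
    """Unit vector x minimizing ||M x|| (last right singular vector)."""
    return np.linalg.svd(M)[2][-1]

def fundamental_8pt(p0, pj):
    """Step 1: linear estimate of F_j with pj[i] @ F_j @ p0[i] = 0 (needs >= 8 points)."""
    rows = np.stack([np.kron(q, p) for p, q in zip(p0, pj)])
    U, s, Vt = np.linalg.svd(null_vector(rows).reshape(3, 3))
    return U @ np.diag([s[0], s[1], 0.0]) @ Vt   # enforce rank 2

def epipoles(F):
    """Step 1: v_j spans the null space of F_j, v'_j that of F_j^T."""
    return null_vector(F), null_vector(F.T)

def homography_dlt(src, dst):
    """Step 2: A (up to scale) with A @ src[i] ~ dst[i], via skew(dst[i]) @ A @ src[i] = 0."""
    rows = np.vstack([np.kron(skew(y), x.reshape(1, 3)) for x, y in zip(src, dst)])
    return null_vector(rows).reshape(3, 3)

def scale_to_reference(A, vp, p00, p0j):
    """Step 2: rescale A so that p0j ~ A p00 + v' (the reference point gets k = 1)."""
    a, b = np.cross(p0j, A @ p00), np.cross(p0j, vp)
    return (-(a @ b) / (a @ a)) * A

def relative_affine_k(i, p0, frames):
    """Step 3 (assumed pooled form): k_i minimizing sum_j |p_ij x (A_j p_i0 + k v'_j)|^2."""
    num = den = 0.0
    for A, vp, pj in frames:
        a, b = np.cross(pj[i], A @ p0[i]), np.cross(pj[i], vp)
        num, den = num + a @ b, den + b @ b
    return -num / den

def rot_y(angle):
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

# Hypothetical noise-free scene: frame 0 is [I|0], frame j is [R_j|t_j].
rng = np.random.default_rng(0)
X = rng.uniform([-1.0, -1.0, 4.0], [1.0, 1.0, 8.0], size=(12, 3))
p0 = X / X[:, 2:3]                      # points p_i0, third coordinate 1
frames = []
for angle, t in [(0.10, [0.5, 0.1, 0.2]), (-0.15, [-0.4, 0.3, 0.1])]:
    Y = X @ rot_y(angle).T + np.array(t)
    pj = Y / Y[:, 2:3]
    v, vp = epipoles(fundamental_8pt(p0, pj))
    A = homography_dlt(np.vstack([p0[1:4], v]), np.vstack([pj[1:4], vp]))
    frames.append((scale_to_reference(A, vp, p0[0], pj[0]), vp, pj))

k_first = np.array([relative_affine_k(i, p0, frames[:1]) for i in range(len(X))])
k_second = np.array([relative_affine_k(i, p0, frames[1:]) for i in range(len(X))])
k_all = np.array([relative_affine_k(i, p0, frames) for i in range(len(X))])
print(np.round(k_all, 4))   # expect k_0 ~ 1 and k_1, k_2, k_3 ~ 0 (they span the plane)
```

Because $k$ is invariant to the choice of the second view, the estimates from frame 1 alone and from frame 2 alone agree on this data, and the pooled estimate agrees with both.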

3.3  Application  II:  Recognition  by  Alignment 

The relative affine invariance relation, captured by Theorem 1, can be used for visual recognition by alignment ([44, 14], and references therein). In other words, the invariance of $k$ can be used to "re-project" the object onto any third view $p''$, as follows. Given two "model" views in full correspondence $p_i \leftrightarrow p'_i$, $i = 1, \ldots, n$, we recover the epipoles and the homography $A$ from $A p_i \cong p'_i$, $i = 1, 2, 3$, and $A v \cong v'$. The corresponding points $p''_i$ in any third view then satisfy $p''_i \cong B p_i + k_i v''$ for some matrix $B$ and epipole $v''$. One can solve for $B$ and $v''$ by observing six corresponding points between the first and third views. Once $B$ and $v''$ are recovered, we can estimate the location of $p''_i$ for the remaining points $p_i$, $i = 7, \ldots, n$. To do so, first solve for $k_i$ from the equation $p'_i \cong A p_i + k_i v'$, then substitute the result into the equation $p''_i \cong B p_i + k_i v''$. Recognition is achieved if the distance between the observed $p''_i$ and the estimated $\hat{p}''_i$, $i = 7, \ldots, n$, is sufficiently small. Other ways to achieve re-projection include the epipolar intersection method (cf. [26, 6, 11]) and the use of projective structure instead of relative affine structure [34, 36]. In all the above methods the epipolar geometry plays a key role and must be recovered beforehand. More direct methods that do not require the epipolar geometry can be found in [35, 37].
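The re-projection step can be sketched in NumPy as shown below. The sketch assumes the scalars $k_i$ from the two model views are already available. To keep it short, it computes $A$ and $v'$ from a known synthetic camera and a hypothetical virtual plane $z = 6$, rather than recovering them from images. It then solves for $B$ and $v''$ linearly from six points by stacking $[p''_i](B p_i + k_i v'') = 0$, and re-projects the remaining points. Names, plane and data are illustrative only.

```python
import numpy as np

def skew(v):
    """[v]: skew-symmetric matrix with skew(v) @ x == np.cross(v, x)."""
    return np.array([[0.0, -v[2], v[1]], [v[2], 0.0, -v[0]], [-v[1], v[0], 0.0]])

def solve_B_v(p, p3, k):
    """Linear solve of p3[i] ~ B p[i] + k[i] v'' for B (3x3) and v'' (needs >= 6 points)."""
    rows = [np.hstack([np.kron(skew(y), x.reshape(1, 3)), ki * skew(y)])
            for x, y, ki in zip(p, p3, k)]
    sol = np.linalg.svd(np.vstack(rows))[2][-1]   # null vector of the 3n x 12 system
    return sol[:9].reshape(3, 3), sol[9:]

def reproject(B, v2, p, k):
    """Predicted third-view positions (inhomogeneous) for points p with scalars k."""
    q = p @ B.T + k[:, None] * v2
    return q[:, :2] / q[:, 2:3]

def project(X, R, t):
    Y = X @ R.T + t
    return Y / Y[:, 2:3]

def rot_y(angle):
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

# Hypothetical scene: frame 0 = [I|0], model view 1 = [R1|t1], novel view = [R2|t2].
rng = np.random.default_rng(1)
X = rng.uniform([-1.0, -1.0, 4.0], [1.0, 1.0, 8.0], size=(20, 3))
R1, t1 = rot_y(0.10), np.array([0.5, 0.1, 0.2])
R2, t2 = rot_y(-0.20), np.array([-0.3, 0.2, 0.4])
p, p1, p2 = X / X[:, 2:3], project(X, R1, t1), project(X, R2, t2)

# Relative affine scalars from the model views, with A the homography of the
# assumed virtual plane z = 6 and v' = t1, so that p1 ~ A p + k v'.
n, d = np.array([0.0, 0.0, 1.0]), 6.0
A, vp = R1 + np.outer(t1, n) / d, t1
a, b = np.cross(p1, p @ A.T), np.cross(p1, vp)
k = -np.sum(a * b, axis=1) / np.sum(b * b, axis=1)

B, v2 = solve_B_v(p[:6], p2[:6], k[:6])        # six points seen in the third view
pred = reproject(B, v2, p[6:], k[6:])
err = np.linalg.norm(pred - p2[6:, :2], axis=1)
print(err.max())   # ~0 on noise-free data; recognition would threshold this distance
```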

3.4  Application  III:  Image  Coding 

The re-projection paradigm described in the previous section can serve as a principle for model-based image compression. In a sender/receiver mode, the sender computes the relative affine structure between two extreme views of a sequence. It then sends the first view, the relative affine scalars, and the homographies and epipoles between the first frame and all the intermediate frames. The receiver reconstructs the intermediate frames by re-projection. Alternatively, the sender sends the two extreme views, together with the homographies and epipoles between the first view and all other intermediate views. The receiver recovers the correspondence field between the two extreme views, and then synthesizes the remaining views from the received homographies and epipoles. When the distance between the two extreme views is "moderate", we found that optical flow techniques can be useful for obtaining the correspondence field between the views. Experiments are reported later in the text. More detailed experiments on the use of optical flow in full registration of images for model-based image compression can be found in [4].
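The parameters sent in the first scheme can be pictured as a small record. The sketch below is hypothetical, and the `CodedSequence` name and its fields are ours. The record stores the first-view points, their relative affine scalars, and one homography and one epipole per frame. It synthesizes frame $j$ as $p_j \cong A_j p_0 + k\, v'_j$. It handles sparse points only; warping the full first image, as a real receiver would, is not shown.

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class CodedSequence:
    """Hypothetical payload: first-view points, scalars k, and (A_j, v'_j) per frame."""
    p0: np.ndarray                                # (n, 3) first-view points, third coordinate 1
    k: np.ndarray                                 # (n,) relative affine scalars
    frames: list = field(default_factory=list)    # [(A_j (3x3), v'_j (3,)), ...]

    def add_frame(self, A, vp):
        self.frames.append((np.asarray(A, dtype=float), np.asarray(vp, dtype=float)))

    def synthesize(self, j):
        """Re-project the first-view points into frame j: p_j ~ A_j p0 + k v'_j."""
        A, vp = self.frames[j]
        q = self.p0 @ A.T + self.k[:, None] * vp
        return q[:, :2] / q[:, 2:3]

    def floats_sent(self):
        """Scalars sent besides the first view: one k per point plus 9 + 3 per frame.
        (A_j, v'_j) share one overall scale, so 11 per frame would suffice."""
        return self.k.size + 12 * len(self.frames)

# Tiny example with made-up parameters.
seq = CodedSequence(p0=np.array([[0.0, 0.0, 1.0], [1.0, 2.0, 1.0]]), k=np.array([0.5, 0.0]))
seq.add_frame(np.eye(3), [1.0, 0.0, 0.0])
print(seq.synthesize(0))   # rows (0.5, 0.0) and (1.0, 2.0)
print(seq.floats_sent())   # 14
```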

4  Experimental  Results 

The following experiments were conducted to illustrate the applications that arise from the relative affine framework (reconstruction, recognition by alignment, and image coding) and to test the algorithms on real data. Performance under real imaging conditions is of particular interest for two reasons. Real cameras deviate from the pin-hole model (radial distortion, decentering, and other effects), and image correspondences contain errors.

Fig. 5 shows four views, out of a sequence of ten views, of the object we selected for experiments. The object is a sneaker with added texture to facilitate the correspondence process. We chose this object for its complexity: it has the shape of a natural object and cannot easily be described parametrically (as a collection of planes or algebraic surfaces). A set of thirty-four points was manually selected on one of the frames, referred to as the first frame. Their correspondences were obtained automatically in all other frames used in this experiment (corresponding points are marked by overlaid squares in Fig. 5). The correspondence process uses an implementation of a coarse-to-fine optical-flow algorithm based on [20] and described in [3].




Figure 5: Four views, out of a sequence of ten views, of a sneaker. The frames shown here are the first, second, fifth and tenth of the sequence (top-bottom, left-to-right). The overlaid squares mark the corresponding points that were tracked and subsequently used for our experiments.
Figure 6: Results of 3D reconstruction of the collection of sample points. (a) Frontal view (aligned with the first frame of the sneaker). The two bottom displays show a side view of the sample: (b) result of recovering structure between the first and tenth frames (large base-line); (c) result of recovery between the first and second frames (small base-line).



Figure 7: Results of re-projection onto the tenth frame. Epipoles were recovered using the ground plane homography (see text). The re-projected points are marked by crosses; for accurate re-projection they should lie at the center of their corresponding squares. (a) Structure was recovered between the first and fifth frames, then re-projected onto the tenth frame (large base-line). Average error is 1.1 pixels with std of 0.98. (b) Structure was recovered between the first and second frames (small base-line situation) and then re-projected onto the tenth frame. Average error is 7.81 pixels with std of 6.5.
Epipoles were recovered by one of the following two methods. The first method uses the four ground points to recover the homography $A$, and then applies Corollary 5 to compute the epipoles from all the remaining points in a least-squares manner. The second method uses the non-linear algorithm proposed in [21]. The two methods gave very similar results for reconstruction and slightly different results for re-projection (see later).
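The first method can be sketched as shown below. This is a minimal NumPy sketch on a hypothetical noise-free scene. It fits $A$ by a DLT to the four ground points. Since $p'^T [v'] A p = v'^T (Ap \times p')$, each remaining point contributes one linear equation $v'^T (Ap_i \times p'_i) = 0$. The least-squares $v'$ is the null vector of the stacked rows, and $v \cong A^{-1} v'$. The ground plane $y = 1$, the camera motion and all names are assumptions of the example.

```python
import numpy as np

def skew(v):
    """[v]: skew-symmetric matrix with skew(v) @ x == np.cross(v, x)."""
    return np.array([[0.0, -v[2], v[1]], [v[2], 0.0, -v[0]], [-v[1], v[0], 0.0]])

def homography_dlt(src, dst):
    """A (up to scale) with A @ src[i] ~ dst[i] (>= 4 pairs, no three collinear)."""
    rows = np.vstack([np.kron(skew(y), x.reshape(1, 3)) for x, y in zip(src, dst)])
    return np.linalg.svd(rows)[2][-1].reshape(3, 3)

def epipoles_from_homography(A, p, pp):
    """Least-squares v' from v'^T (A p_i x p'_i) = 0 over off-plane points; v ~ A^{-1} v'."""
    rows = np.cross(p @ A.T, pp)            # one row (A p_i x p'_i) per point
    vp = np.linalg.svd(rows)[2][-1]
    return np.linalg.solve(A, vp), vp

def project(X, R, t):
    Y = X @ R.T + t
    return Y / Y[:, 2:3]

c, s = np.cos(0.1), np.sin(0.1)
R, t = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]), np.array([0.5, 0.1, 0.2])
rng = np.random.default_rng(2)
ground = np.column_stack([rng.uniform(-1, 1, 4), np.ones(4), rng.uniform(4, 8, 4)])  # on y = 1
others = rng.uniform([-1.0, -1.0, 4.0], [1.0, 0.5, 8.0], size=(15, 3))              # off the plane
p_g, pp_g = ground / ground[:, 2:3], project(ground, R, t)
p_o, pp_o = others / others[:, 2:3], project(others, R, t)

A = homography_dlt(p_g, pp_g)
v, vp = epipoles_from_homography(A, p_o, pp_o)
# In this synthetic setup the true epipoles are v' ~ t and v ~ -R^T t.
print(np.linalg.norm(np.cross(vp / np.linalg.norm(vp), t / np.linalg.norm(t))))  # ~0
```

On real data one would normalize the coordinates and perhaps weight the rows. The sketch does neither.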

In the reconstruction paradigm, we recovered relative affine structure from two views and from multiple views. In the two-view case we used either a small base-line (the first two views of the sequence) or a large base-line (the first and last views of the sequence). In the multiple-view case we used all ten views of the sequence (Corollary 6). The transformation to Euclidean coordinates was done for display purposes only. It assumes that the ground plane is parallel to the image plane (it actually is not) and that the camera is calibrated (no calibration was attempted).

The 3D coordinates are shown in Fig. 6. Display (a) shows a frontal view, chosen to align the display visually with the image of the sneaker. The other displays show a side view of the reconstructed sneaker under the following experimental situations. Display (b) shows the reconstruction in the large base-line situation. The two methods for obtaining the epipoles produced very similar results here, and so did the multiple-view case. The side view illustrates the robustness of the reconstruction process, because it was obtained by rotating the object around a different axis than the one used for capturing the images. Display (c) shows the reconstruction in the small base-line situation (both methods for obtaining the epipoles produced very similar results). As expected, the quality of reconstruction in the latter case is not as good as in the former. Nevertheless, the system does not break down completely under relatively small base-lines, and it still produces a reasonable result.

In the re-projection application (see Section 3.3), relative affine structure was recovered from the first view and an in-between view, and then re-projected onto the last view of the sequence. This is an extrapolation example, so performance is expected to be poorer than in interpolation examples, i.e., cases where the re-projected view lies in between the model views. The interpolation case is discussed below, together with its relevance to image coding applications.

In general, the performance was better when the ground plane was used for recovering the epipoles. When the intermediate view was the fifth in the sequence (Fig. 5, display (c)), the average re-projection error was 1.1 pixels, with a standard deviation of 0.98 pixels. When the intermediate view was the second frame in the sequence (Fig. 5, display (b)), the results were poorer because of the small base-line and large extrapolation. The average error was 7.81 pixels, with a standard deviation of 6.5. These two cases are displayed in Fig. 7. The re-projected points are represented by crosses overlaid on the last frame (the re-projected view).

We then used the second method for computing the epipoles, which is more general but usually less accurate. With the fifth frame, the average error was 1.62 pixels (standard deviation of 1.2). With the second frame (small base-line situation), the average error was 13.87 pixels (standard deviation of 9.47). These two cases are displayed in Fig. 8. All points were used for recovering the epipoles. The re-projection performance therefore only indicates the accuracy one can obtain when all the information is used. In practice we would like to use far fewer points from the re-projected view. Re-projection methods that avoid the epipoles altogether would therefore be preferred; an example of such a method can be found in [35, 37].

For the image coding paradigm (see Section 3.4), the relative affine structure of the 34 sample points was computed between the first and last frames of the ten-frame sequence (displays (a) and (d) in Fig. 5). Display (a) in Fig. 9 plots the average re-projection error for all the intermediate frames (second to ninth). Display (b) shows the relative error, normalized by the distance between corresponding points across the sequence. The relative error generally decreases as the re-projected frame moves farther from the first frame (i.e., as the base-line increases). In all frames the average error is less than 1 pixel, which indicates relatively robust performance in practice.
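The two error measures can be computed as shown below. This sketch makes two assumptions. The text does not say whether the relative error is a per-point ratio or a ratio of averages, so here it is the mean re-projection error divided by the mean displacement of the points from the first frame. The standard deviation uses NumPy's default population form (ddof=0). The function name is hypothetical.

```python
import numpy as np

def reprojection_stats(predicted, observed, first):
    """Mean/std of pixel error, and mean error relative to mean displacement from frame 1.
    predicted, observed, first: (n, 2) pixel coordinates of the same n points."""
    err = np.linalg.norm(predicted - observed, axis=1)    # per-point Euclidean error
    disp = np.linalg.norm(observed - first, axis=1)       # per-point motion since frame 1
    return err.mean(), err.std(), err.mean() / disp.mean()

mean, std, rel = reprojection_stats(np.array([[1.0, 0.0], [3.0, 4.0]]),
                                    np.array([[0.0, 0.0], [3.0, 4.0]]),
                                    np.array([[0.0, 0.0], [0.0, 0.0]]))
print(mean, std, rel)   # 0.5 0.5 0.2
```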


The "relative affine" framework was introduced and shown to be general and sharper than the projective results, both for 3D reconstruction from multiple views and for recognition by alignment. A key idea in this work is to define and recover an invariant that lies in the middle ground between affine and projective. This middle ground is achieved by making the camera center of one arbitrary view part of the projective reference frame (of five points). That choice yields the first result, described in Theorem 1 (originally in [33]). The result states that under general uncalibrated camera motion, the sharpest result we can obtain is that all the degrees of freedom are captured by two components:

- four points, so that the scene may undergo at most 3D affine transformations;
- a single unknown projective transformation, from the arbitrary viewer-centered representation to the camera coordinate frame.

The invariants obtained in this way are viewer-centered, since the camera center is part of the reference frame, and they are called "relative affine structure". The statement that all the available degrees of freedom are captured by four points and one projective transformation was also presented recently in [40]. That work uses different notation and tools from those used here and in [33, 38].

This "middle ground" approach has several advantages. First, the results are sharper than those of a full projective reconstruction approach ([7, 13]), which needs five scene points. The increased sharpness translates into a remarkably simple framework captured by a single equation (Equation 1). Second, the way the results were derived provides a means for unifying a wide range of previous results into a canonical framework. Following Theorem 2, the corollaries show how this "middle ground" reduces back to full affine structure and extends into full projective structure (Corollaries 1 and 4). The corollaries also show how easily the plane at infinity is manipulated in this framework. This makes further connections among projective, affine and Euclidean results, in both general and less general situations (Corollaries 2 and 3). The corollaries also unify the various results related to the epipolar geometry of two views: the Essential matrix of [19], the Fundamental matrix of [7] and other related results of [13] (Corollary 5). Many of these connections and results are obtained with a single-line proof and follow naturally from the relative affine framework.

Figure 8: Re-projection onto the tenth frame. Epipoles are computed via the fundamental matrix (see text) using the implementation of [21]. (a) Large base-line situation (structure computed between the first and fifth frames): average error 1.62 pixels with std of 1.2. (b) Small base-line situation (structure computed between the first and second frames): average error 13.87 pixels with std of 9.47.

Figure 9: Error in re-projection onto the intermediate frames (2-9). Structure was computed between frames one and ten. (a) Average error in pixels; (b) relative error normalized by the displacement between corresponding points.

Finally, the relative affine result has proven useful for deriving other results and applications, some of which can be found in [39, 37, 35]. These derivations rely critically on the simplicity of the relative affine framework. In some cases [37, 35] they also rely on its sharpness compared to the projective framework.